Case Studies on Cache Performance and Optimization of Programs with Unit Strides

Authors

Pei-Chi Wu

Kuo-Chan Huang

Abstract

Cache performance in modern computers is important for program efficiency. A cache is thrashing if a significant amount of time is spent moving data between the memory and the cache. This paper presents two cache thrashing examples, one in scientific computing and one in image processing, both of which involve several one-dimensional arrays that are accessed sequentially, i.e., with unit strides. Accessing arrays with unit strides has been considered very efficient on cache-based computer systems. However, the existence of cache thrashing is demonstrated by significant increases in computing speed in the equivalent programs tuned for cache locality. This shows that accessing several arrays sequentially may cause cache thrashing. Thus, to improve cache performance, it is important that the compiler or the programmer take all arrays inside a loop into consideration. © 1997 by John Wiley & Sons, Ltd.
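
The abstract does not say which cache organization or array layout produced the thrashing. As a minimal sketch of one common mechanism, the Python snippet below simulates a hypothetical direct-mapped cache (8 KB, 32-byte lines, both values assumed for illustration) running a loop of the form c[i] = a[i] + b[i]. The function name `count_misses` and the base addresses are illustrative and do not come from the paper. When the three arrays start at multiples of the cache size, every unit-stride access maps to the same cache line and evicts its neighbor's data; offsetting each array by one cache line removes the conflict.

```python
def count_misses(bases, n, elem_size=8, cache_size=8192, line_size=32):
    """Count misses in a simulated direct-mapped cache for a loop that
    touches element i of every array once per iteration.

    bases: starting byte address of each array (hypothetical layout).
    Cache and line sizes are assumed values, not taken from the paper.
    """
    num_lines = cache_size // line_size
    tags = [None] * num_lines  # memory block currently held by each cache line
    misses = 0
    for i in range(n):
        for base in bases:
            block = (base + i * elem_size) // line_size  # memory block number
            slot = block % num_lines                     # direct-mapped placement
            if tags[slot] != block:
                misses += 1
                tags[slot] = block
    return misses


CACHE = 8192  # bytes, assumed
N = 1024      # 1024 eight-byte elements = exactly one cache size per array

# Arrays laid out back to back: bases differ by a multiple of the cache size.
aligned = [0, CACHE, 2 * CACHE]
# Same arrays, each shifted by one extra 32-byte line of padding.
padded = [0, CACHE + 32, 2 * CACHE + 64]

aligned_misses = count_misses(aligned, N)
padded_misses = count_misses(padded, N)
print("aligned:", aligned_misses)  # every access misses
print("padded: ", padded_misses)   # only first-touch misses remain
```

In this model the aligned layout misses on all 3 × 1024 accesses, while the padded layout incurs only the 3 × 256 first-touch misses, even though both loops read the arrays with unit stride. Real caches are often set-associative, which tolerates a few conflicting arrays, so the numbers here illustrate the effect rather than predict any particular machine.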

Similar resources

Improving cache locality for GPU-based volume rendering

We present a cache-aware method for accelerating texture-based volume rendering on a graphics processing unit (GPU). Because a GPU has hierarchical architecture in terms of processing and memory units, cache optimization is important to maximize performance for memory-intensive applications. Our method localizes texture memory reference according to the location of the viewpoint and dynamically...

Design and Experience: Using the Intel® Itanium® 2 Processor Performance Monitoring Unit to Implement Feedback Optimizations

Historically, profile-guided optimization has gathered its profile data by executing an instrumented binary and capturing the output. While this approach enables the collection of function and basic block frequencies, it cannot extract microarchitectural event information such as cache activity, TLB activity, and branch prediction behavior. Using instrumentation also requires that programs be c...

Data Partitioning for a Good Node Performance

As a consequence of recent advances in interconnection network technology for MIMD parallel computers, optimizing communications in parallel programs has become a factor of secondary importance. For example, mapping processes onto processors is currently an issue of minor importance for some up-to-date distributed memory parallel computers, because the interconnection network guarantees a fairl...

A Profiling Tool for Detecting Cache-Critical Data Structures

A poor cache behavior can significantly prohibit achieving high speedup and scalability of parallel applications. This means optimizing a program with respect to cache locality can potentially introduce considerable performance gain. As a consequence, programmers usually perform cache locality optimization for acquiring the expected performance of their applications. Within this work, we develo...

Reduction of Cache Interference Misses through Selective Bit-Permutation

Cache miss rates have a large and increasing impact on overall performance. In this report, we address the problem of cache interference in regular numerical programs dominated by strided memory access patterns. In our scheme, the interfering strides in each region of memory may be annotated by the programmer, detected at compile time, or even at run time. The algorithm developed in this report...

Journal title:

Softw., Pract. Exper.

Volume 27, Issue

Pages -

Publication date 1997
